Abstract

This paper is motivated by the problem of error control in network coding when errors are introduced in a random fashion (rather than chosen by an adversary). An additive-multiplicative matrix channel is considered as a model for random network coding. The model assumes that $n$ packets of length $m$ are transmitted over the network, and up to $t$ erroneous packets are randomly chosen and injected into the network. Upper and lower bounds on capacity are obtained for any channel parameters, and asymptotic expressions are provided in the limit of large field or matrix size. A simple coding scheme is presented that achieves capacity in both limiting cases. The scheme has decoding complexity $O(n^2 m)$ and a probability of error that decreases exponentially both in the packet length and in the field size in bits. Extensions of these results for coherent network coding are also presented.



Index Terms

Error correction, error trapping, matrix channels, network coding, one-shot codes, probabilistic error model.



I. Introduction 

Linear network coding [1]-[3] is a promising new approach to information dissemination over networks.
The fact that packets may be linearly combined at intermediate nodes affords, in many useful scenarios,
higher rates than conventional routing approaches. If the linear combinations are chosen in a random,
distributed fashion, then random linear network coding [4] not only maintains most of the benefits of
linear network coding, but also affords a remarkable simplicity of design that is practically very appealing.
However, linear network coding has the intrinsic drawback of being extremely sensitive to error 
propagation. Due to packet mixing, a single corrupt packet has the potential to contaminate all packets 
received by a destination node. The problem is better understood by looking at a matrix model for 
(single-source) linear network coding, given by 

Y = AX + DZ. (1) 

All matrices are over a finite field. Here, $X$ is an $n \times m$ matrix whose rows are packets transmitted by the
source node, $Y$ is an $N \times m$ matrix whose rows are the packets received by a (specific) destination node,
and $Z$ is a $t \times m$ matrix whose rows are the additive error packets injected at some network links. The
matrices A and D are transfer matrices that describe the linear transformations incurred by packets on 
route to the destination. Such linear transformations are responsible for the (unconventional) phenomenon 
of error propagation. 

There has been an increasing amount of research on error control for network coding, with results 
naturally depending on the specific channel model used, i.e., the joint statistics of A, D and Z given X. 
Under a worst-case (or adversarial) error model, the work in [5], [6] (together with [7]-[10]) has obtained 
the maximum achievable rate for a wide range of conditions. If $A$ is square ($N = n$) and nonsingular,
and $m > n$, then the maximum information rate that can be achieved in a single use of the channel is
exactly $n - 2t$ packets when $A$ is known at the receiver, and approximately $\frac{m-n}{m}(n - 2t)$ packets when
$A$ is unknown. These approaches are inherently pessimistic and share many similarities with classical
coding theory. 

Recently, Montanari and Urbanke [11] brought the problem to the realm of information theory by 
considering a probabilistic error model. Their model assumes, as above, that $A$ is invertible and $m > n$;
in addition, they assume that the matrix $DZ$ is chosen uniformly at random among all $n \times m$ matrices
of rank $t$. For such a model and, under the assumption that the transmitted matrix $X$ must contain an
$n \times n$ identity submatrix as a header, they compute the maximal mutual information in the limit of large
matrix size: approximately $\frac{m-n-t}{m}(n - t)$ packets per channel use. They also present an iterative coding
scheme with decoding complexity $O(n^3 m)$ that asymptotically achieves this rate.

The present paper is motivated by [11], and by the challenge of computing or approximating the actual 
channel capacity (i.e., without any prior assumption on the input distribution) for any channel parameters 
(i.e., not necessarily in the limit of large matrix size). Our contributions can be summarized as follows: 



• Assuming that the matrix $A$ is a constant known to the receiver, we compute the exact channel
capacity for any channel parameters. We also present a simple coding scheme that asymptotically 
achieves capacity in the limit of large field or matrix size. 

• Assuming that the matrix A is chosen uniformly at random among all nonsingular matrices, we 
compute upper and lower bounds on the channel capacity for any channel parameters. These bounds 
are shown to converge asymptotically in the limit of large field or matrix size. We also present a 
simple coding scheme that asymptotically achieves capacity in both limiting cases. The scheme has 
decoding complexity $O(n^2 m)$ and a probability of error that decays exponentially fast both in the
packet length and in the field size in bits. 

• We present several extensions of our results for situations where the matrices A, D and Z may be 
chosen according to more general probability distributions. 

A main assumption that underlies this paper (even the extensions mentioned above) is that the transfer 
matrix A is always invertible. One might question whether this assumption is realistic for actual network 
coding systems. For instance, if the field size is small, then random network coding may not produce 
a nonsingular A with high probability. We believe, however, that removing this assumption complicates 
the analysis without offering much insight. Under an end-to-end coding (or layered) approach, there is a 
clear separation between the network coding protocol — which induces a matrix channel — and the error 
control techniques applied at the source and destination nodes. In this case, it is reasonable to assume 
that the network coding system will be designed to be feasible (i.e., able to deliver $X$ to all destinations)
when no errors occur in the network. Indeed, a main premise of linear network coding is that the field 
size is sufficiently large in order to allow a feasible network code. Thus, the results of this paper may 
be seen as conditional on the network coding layer being successful in its task. 

The remainder of this paper is organized as follows. In Section II, we provide general considerations
on the type of channels studied in this paper. In Section III, we address a special case of (1) where
$A$ is random and $t = 0$, which may be seen as a model for random network coding without errors. In
Section IV, we address a special case of (1) where $A$ is the identity matrix. This channel may be seen as
a model for network coding with errors when $A$ is known at the receiver, since the receiver can always
compute $A^{-1}Y$. The complete channel with a random, unknown $A$ is addressed in Section V, where we
make crucial use of the results and intuition developed in the previous sections. Section VI discusses
possible extensions of our results, and Section VII presents our conclusions.

We will make use of the following notation. Let $\mathbb{F}_q$ be the finite field with $q$ elements. We use $\mathbb{F}_q^{n \times m}$
to denote the set of all $n \times m$ matrices over $\mathbb{F}_q$ and $\mathcal{T}_{n \times m,t}(\mathbb{F}_q)$ to denote the set of all $n \times m$ matrices of
rank $t$ over $\mathbb{F}_q$. We shall write simply $\mathcal{T}_{n \times m,t} = \mathcal{T}_{n \times m,t}(\mathbb{F}_q)$ when the field $\mathbb{F}_q$ is clear from the context.
We also use the notation $\mathcal{T}_{n \times m} = \mathcal{T}_{n \times m,\min\{n,m\}}$ for the set of all full-rank $n \times m$ matrices. The $n \times m$
all-zero matrix and the $n \times n$ identity matrix are denoted by $0_{n \times m}$ and $I_{n \times n}$, respectively, where the
subscripts may be omitted when there is no risk of confusion. The reduced row echelon (RRE) form of
a matrix $M$ will be denoted by $\mathrm{RRE}(M)$.

II. Matrix Channels 

For clarity and consistency of notation, we recall a few definitions from information theory [12]. 

A discrete channel $(\mathcal{X}, \mathcal{Y}, p_{Y|X})$ consists of an input alphabet $\mathcal{X}$, an output alphabet $\mathcal{Y}$, and a conditional probability distribution $p_{Y|X}$ relating the channel input $X \in \mathcal{X}$ and the channel output $Y \in \mathcal{Y}$. An $(M, \ell)$ code for a channel $(\mathcal{X}, \mathcal{Y}, p_{Y|X})$ consists of an encoding function $\{1, \ldots, M\} \to \mathcal{X}^\ell$ and a decoding function $\mathcal{Y}^\ell \to \{1, \ldots, M, f\}$, where $f$ denotes a decoding failure. It is understood that an $(M, \ell)$ code is applied to the $\ell$th extension of the discrete memoryless channel $(\mathcal{X}, \mathcal{Y}, p_{Y|X})$. A rate $R$ (in bits) is said to be achievable if there exists a sequence of $(\lceil 2^{\ell R} \rceil, \ell)$ codes such that decoding is unsuccessful (either an error or a failure occurs) with probability arbitrarily small as $\ell \to \infty$. The capacity of the channel is the supremum of all achievable rates. It is well-known that the capacity is given by

$$C = \max_{p_X} I(X; Y)$$

where $p_X$ denotes the input distribution.

Here, we are interested in matrix channels, i.e., channels for which both the input and output variables 
are matrices. In particular, we are interested in a family of additive matrix channels given by the channel 
law 

Y = AX + DZ (2) 

where $X, Y \in \mathbb{F}_q^{n \times m}$, $A \in \mathbb{F}_q^{n \times n}$, $D \in \mathbb{F}_q^{n \times t}$, $Z \in \mathbb{F}_q^{t \times m}$, and $X$, $(A, D)$ and $Z$ are statistically independent. Since the capacity of a matrix channel naturally scales with $nm$, we also define a normalized capacity

$$\bar{C} = \frac{1}{nm} C.$$

In the following, we assume that the statistics of $A$, $D$ and $Z$ are given for all $q, n, m, t$. In this case, we may denote a matrix channel simply by the tuple $(q, n, m, t)$, and we may also indicate this dependency in both $C$ and $\bar{C}$. We now define two limiting forms of a matrix channel (strictly speaking, of a sequence of matrix channels). The first form, which we call the infinite-field-size channel, is obtained by taking $q \to \infty$. The capacity of this channel is given by

$$\lim_{q \to \infty} \frac{1}{\log_2 q}\, C(q, n, m, t)$$

represented in $q$-ary units per channel use. The second form, which we call the infinite-rank channel, is obtained by setting $t = \tau n$ and $n = \lambda m$, and taking $m \to \infty$. The normalized capacity of this channel is given by

$$\lim_{m \to \infty} \frac{1}{\log_2 q}\, \bar{C}(q, \lambda m, m, \tau \lambda m)$$

represented in $q$-ary units per transmitted $q$-ary symbol. We will hereafter assume that logarithms are taken to the base $q$ and omit the factor $\frac{1}{\log_2 q}$ from the above expressions.

Note that, to achieve the capacity of an infinite-field-size channel (similarly for an infinite-rank channel), one should find a two-dimensional family of codes: namely, a sequence of codes with increasing block length $\ell$ for each $q$, as $q \to \infty$ (or for each $m$, as $m \to \infty$).

We will simplify our task here by considering only codes with block length $\ell = 1$, which we call one-shot codes. We will show, however, that these codes can achieve the capacity of both the infinite-field-size and the infinite-rank channels, at least for the classes of channels considered here. In other words, one-shot codes are asymptotically optimal as either $q \to \infty$ or $m \to \infty$.

For completeness, we define also two more versions of the channel: the infinite-packet-length channel, obtained by fixing $q$, $t$ and $n$, and letting $m \to \infty$, and the infinite-batch-size channel, obtained by fixing $q$, $t$ and $m$, and letting $n \to \infty$. These channels are discussed in Section VI-E.

It is important to note that a $(q, n, \ell m, t)$ channel is not the same as the $\ell$-extension of a $(q, n, m, t)$ channel. For instance, the 2-extension of a $(q, n, m, t)$ channel has channel law

$$(Y_1, Y_2) = (A_1 X_1 + D_1 Z_1,\; A_2 X_2 + D_2 Z_2)$$

where $(X_1, X_2) \in (\mathbb{F}_q^{n \times m})^2$, and $(A_1, D_1, Z_1)$ and $(A_2, D_2, Z_2)$ correspond to independent realizations of a $(q, n, m, t)$ channel. This is not the same as the channel law for a $(q, n, 2m, t)$ channel,

$$\begin{bmatrix} Y_1 & Y_2 \end{bmatrix} = A_1 \begin{bmatrix} X_1 & X_2 \end{bmatrix} + D_1 \begin{bmatrix} Z_1 & Z_2 \end{bmatrix}$$

since $(A_2, D_2)$ may not be equal to $(A_1, D_1)$.

To the best of our knowledge, the $\ell$-extension of a $(q, n, m, t)$ channel has not been considered in previous works, with the exception of [13]. For instance, [14] and [11] consider only limiting forms of a $(q, n, m, t)$ channel. Although both models are referred to simply as "random linear network coding," the model implied by the results in [11] is in fact an infinite-rank channel, while the model implied by the results in [14] is an infinite-packet-length-infinite-field-size channel.

We now proceed to investigating special cases of (2), by considering specific statistics for $A$, $D$ and $Z$.

III. The Multiplicative Matrix Channel 
We define the multiplicative matrix channel (MMC) by the channel law

$$Y = AX$$

where $A \in \mathcal{T}_{n \times n}$ is chosen uniformly at random among all $n \times n$ nonsingular matrices, and independently
from $X$. Note that the MMC is a $(q, n, m, 0)$ channel.

A. Capacity and Capacity-Achieving Codes 

In order to find the capacity of this channel, we will first solve a more general problem. 

Proposition 1: Let $\mathcal{G}$ be a finite group that acts on a finite set $\mathcal{S}$. Consider a channel with input variable $X \in \mathcal{S}$ and output variable $Y \in \mathcal{S}$ given by $Y = AX$, where $A \in \mathcal{G}$ is drawn uniformly at random and independently from $X$. The capacity of this channel, in bits per channel use, is given by

$$C = \log_2 |\mathcal{S}/\mathcal{G}|$$

where $|\mathcal{S}/\mathcal{G}|$ is the number of equivalence classes of $\mathcal{S}$ under the action of $\mathcal{G}$. Any complete set of representatives of the equivalence classes is a capacity-achieving code.

Proof: For each $x \in \mathcal{S}$, let $\mathcal{G}(x) = \{gx \mid g \in \mathcal{G}\}$ denote the orbit of $x$ under the action of $\mathcal{G}$. Recall that $\mathcal{G}(y) = \mathcal{G}(x)$ for all $y \in \mathcal{G}(x)$ and all $x \in \mathcal{S}$, that is, the orbits form equivalence classes.

For $y \in \mathcal{G}(x)$, let $\mathcal{G}_{x,y} = \{g \in \mathcal{G} \mid gx = y\}$. By a few manipulations, it is easy to show that $|\mathcal{G}_{x,y}| = |\mathcal{G}_{x,y'}|$ for all $y, y' \in \mathcal{G}(x)$. Since $A$ has a uniform distribution, it follows that $P[Y = y \mid X = x] = 1/|\mathcal{G}(x)|$, for all $y \in \mathcal{G}(x)$.

For any $x \in \mathcal{S}$, consider the same channel but with the input alphabet restricted to $\mathcal{G}(x)$. Note that the output alphabet will also be restricted to $\mathcal{G}(x)$. This is a $|\mathcal{G}(x)|$-ary channel with uniform transition probabilities; thus, the capacity of this channel is 0. Now, the overall channel can be considered as a sum (union of alphabets) of all the restricted channels. The capacity of a sum of $M$ channels with capacities $C_i$, $i = 1, \ldots, M$, is known to be $\log_2 \sum_{i=1}^{M} 2^{C_i}$ bits. Thus, the capacity of the overall channel is $\log_2 M$ bits, where $M = |\mathcal{S}/\mathcal{G}|$ is the number of orbits. A capacity-achieving code (with block length 1) may be obtained by simply selecting one representative from each equivalence class. ■
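
As a sanity check on Proposition 1, the following Python sketch counts orbits by brute force for the case where the set is all $n \times m$ matrices and the group is all invertible $n \times n$ matrices acting by left multiplication. It assumes a prime field size $q$ (so that integer arithmetic modulo $q$ is field arithmetic) and tiny parameters; the function names are illustrative and do not come from the paper.

```python
from itertools import product
from math import log2

def all_matrices(rows, cols, q):
    """Yield every rows x cols matrix over Z_q as a tuple of row tuples."""
    for entries in product(range(q), repeat=rows * cols):
        yield tuple(entries[r * cols:(r + 1) * cols] for r in range(rows))

def matmul(A, B, q):
    """Matrix product modulo q."""
    return tuple(
        tuple(sum(a * b for a, b in zip(row, col)) % q for col in zip(*B))
        for row in A
    )

def is_invertible(A, q):
    """Over a prime field, a square matrix is invertible iff x -> Ax is injective."""
    n = len(A)
    images = {
        tuple(sum(a * x for a, x in zip(row, vec)) % q for row in A)
        for vec in product(range(q), repeat=n)
    }
    return len(images) == q ** n

def count_orbits(q, n, m):
    """Number of orbits of n x m matrices under left multiplication by GL_n(F_q)."""
    group = [A for A in all_matrices(n, n, q) if is_invertible(A, q)]
    orbits = {
        frozenset(matmul(A, X, q) for A in group)
        for X in all_matrices(n, m, q)
    }
    return len(orbits)

if __name__ == "__main__":
    M = count_orbits(2, 2, 3)
    print("orbits:", M, "capacity in bits:", log2(M))
```

For $q = 2$, $n = 2$, $m = 3$ the orbit count is $15 = 1 + 7 + 7$, which is the number of subspaces of $\mathbb{F}_2^3$ of dimension at most 2.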



Proposition 1 shows that in a channel induced by a group action, where the group elements are selected
uniformly at random, the receiver cannot distinguish between transmitted elements that belong to the same 
equivalence class. Thus, the transmitter can only communicate the choice of a particular equivalence class. 

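Returning to our original problem, we have $\mathcal{S} = \mathbb{F}_q^{n \times m}$ and $\mathcal{G} = \mathcal{T}_{n \times n}$ (the general linear group $GL_n(\mathbb{F}_q)$). The equivalence classes of $\mathcal{S}$ under the action of $\mathcal{G}$ are the sets of matrices that share the same row space. Thus, we can identify each equivalence class with a subspace of $\mathbb{F}_q^m$ of dimension at most $n$. Let the Gaussian coefficient

$$\begin{bmatrix} m \\ k \end{bmatrix}_q = \prod_{i=0}^{k-1} \frac{q^m - q^i}{q^k - q^i}$$

denote the number of $k$-dimensional subspaces of $\mathbb{F}_q^m$. We have the following corollary of Proposition 1.

Corollary 2: The capacity of the MMC, in $q$-ary units per channel use, is given by

$$C_{\mathrm{MMC}} = \log_q \sum_{k=0}^{n} \begin{bmatrix} m \\ k \end{bmatrix}_q.$$

A capacity-achieving code $\mathcal{C} \subseteq \mathbb{F}_q^{n \times m}$ can be obtained by ensuring that each $k$-dimensional subspace of $\mathbb{F}_q^m$, $k \le n$, is the row space of some unique $X \in \mathcal{C}$.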
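
The capacity expression in Corollary 2 is straightforward to evaluate numerically. The sketch below computes the Gaussian coefficient with exact integer arithmetic and then the MMC capacity in $q$-ary units; the function names `gaussian_binomial` and `mmc_capacity` are illustrative choices, not notation from the paper.

```python
import math

def gaussian_binomial(m, k, q):
    """Number of k-dimensional subspaces of F_q^m (zero if k is out of range)."""
    if k < 0 or k > m:
        return 0
    num, den = 1, 1
    for i in range(k):
        num *= q ** m - q ** i
        den *= q ** k - q ** i
    return num // den  # the quotient is always an integer

def mmc_capacity(q, n, m):
    """Capacity of the MMC in q-ary units per channel use (Corollary 2)."""
    total = sum(gaussian_binomial(m, k, q) for k in range(n + 1))
    return math.log(total) / math.log(q)

if __name__ == "__main__":
    q, n, m = 4, 2, 6
    print("C_MMC =", mmc_capacity(q, n, m), "vs (m - n) n =", (m - n) * n)
```

For moderate field sizes the computed value already sits only slightly above $(m - n)n$, in line with the limiting behaviour discussed next.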

Note that Corollary 2 reinforces the idea introduced in [9] that, in order to communicate under random
network coding, the transmitter should encode information in the choice of a subspace.

We now compute the capacity for the two limiting forms of the channel, as discussed in Section II.
We have the following result.

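Proposition 3: Let $\lambda = n/m$ and assume $0 < \lambda \le 1/2$. Then

$$\lim_{q \to \infty} C_{\mathrm{MMC}} = (m - n)n \tag{3}$$

$$\lim_{\substack{m \to \infty \\ n = \lambda m}} \bar{C}_{\mathrm{MMC}} = 1 - \lambda. \tag{4}$$

Proof: First, observe that

$$\begin{bmatrix} m \\ n^* \end{bmatrix}_q \le \sum_{k=0}^{n} \begin{bmatrix} m \\ k \end{bmatrix}_q \le (n + 1) \begin{bmatrix} m \\ n^* \end{bmatrix}_q \tag{5}$$

where $n^* = \min\{n, \lfloor m/2 \rfloor\}$. Using the fact that [9]

$$q^{(m-k)k} < \begin{bmatrix} m \\ k \end{bmatrix}_q < 4 q^{(m-k)k} \tag{6}$$

it follows that

$$(m - n^*)n^* < C_{\mathrm{MMC}} < (m - n^*)n^* + \log_q 4(n + 1). \tag{7}$$

The last term on the right vanishes in both limiting cases. ■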



The case $\lambda > 1/2$ can also be readily obtained but is less interesting since, in practice, the packet length $m$ will be much larger than the number of packets $n$.

Note that an expression similar to (7) has been found in [13] under a different assumption on the transfer matrix (namely, that $A$ is uniform on $\mathbb{F}_q^{n \times n}$). It is interesting to note that, also in that case, the same conclusion can be reached about the sufficiency of transmitting subspaces [13].

An intuitive way to interpret (3) is the following: out of the $nm$ symbols obtained by the receiver, $n^2$ of these symbols are used to describe $A$, while the remaining ones are used to communicate $X$.

It is interesting to note that (3) precisely matches (4) after normalizing by the total number of transmitted symbols, $nm$.

Both limiting capacity expressions (3) and (4) can be achieved using a simple coding scheme where an $n \times (m - n)$ data matrix $U$ is concatenated on the left with an $n \times n$ identity matrix $I$, yielding a transmitted matrix $X = \begin{bmatrix} I & U \end{bmatrix}$. The first $n$ symbols of each transmitted packet may be interpreted as pilot symbols used to perform "channel sounding". Note that this is simply the standard way of using random network coding [15].

IV. The Additive Matrix Channel 
We define the additive matrix channel (AMC) according to 

Y = X + W 

where $W \in \mathcal{T}_{n \times m,t}$ is chosen uniformly at random among all $n \times m$ matrices of rank $t$, independently
from $X$. Note that the AMC is a $(q, n, m, t)$ channel with $D \in \mathcal{T}_{n \times t}$ and $Z \in \mathcal{T}_{t \times m}$ uniformly distributed,
and $A = I$.

A. Capacity 

The capacity of the AMC is computed in the next proposition. 
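Proposition 4: The capacity of the AMC is given by

$$C_{\mathrm{AMC}} = nm - \log_q |\mathcal{T}_{n \times m,t}|.$$

For $\lambda = n/m$ and $\tau = t/n$, we have the limiting expressions

$$\lim_{q \to \infty} C_{\mathrm{AMC}} = (m - t)(n - t) \tag{8}$$

$$\lim_{\substack{m \to \infty \\ n = \lambda m \\ t = \tau n}} \bar{C}_{\mathrm{AMC}} = (1 - \lambda\tau)(1 - \tau). \tag{9}$$

Proof: To compute the capacity, we expand the mutual information

$$I(X; Y) = H(Y) - H(Y|X) = H(Y) - H(W)$$

where the last equality holds because $X$ and $W$ are independent. Note that $H(Y) \le nm$, and the maximum is achieved when $Y$ is uniform. Since $H(W)$ does not depend on the input distribution, we can maximize $H(Y)$ by choosing, e.g., a uniform $p_X$.

The entropy of $W$ is given by $H(W) = \log_q |\mathcal{T}_{n \times m,t}|$. The number of $n \times m$ matrices of rank $t$ is given by [16, p. 455]

$$|\mathcal{T}_{n \times m,t}| = \frac{|\mathcal{T}_{n \times t}|\,|\mathcal{T}_{t \times m}|}{|\mathcal{T}_{t \times t}|} = |\mathcal{T}_{n \times t}| \begin{bmatrix} m \\ t \end{bmatrix}_q \tag{10}$$

$$= q^{(n+m-t)t} \prod_{i=0}^{t-1} \frac{(1 - q^{i-n})(1 - q^{i-m})}{1 - q^{i-t}}. \tag{11}$$

Thus,

$$C_{\mathrm{AMC}} = nm - \log_q |\mathcal{T}_{n \times m,t}| = (m - t)(n - t) + \log_q \prod_{i=0}^{t-1} \frac{1 - q^{i-t}}{(1 - q^{i-n})(1 - q^{i-m})}.$$

The limiting expressions (8) and (9) follow immediately from the equation above. ■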

Remark: The expression (9), which gives the capacity of the infinite-rank AMC, has been previously
obtained in [11] for a channel that is equivalent to the AMC. Our proof is a simple extension of the
proof in [11].

As can be seen from (11), an $n \times m$ matrix of rank $t$ can be specified with approximately $(n + m - t)t$
symbols. Thus, the capacity (8) can be interpreted as the number of symbols conveyed by $Y$ minus the
number of symbols needed to describe $W$.
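
The counting formula for rank-$t$ matrices and the resulting AMC capacity can be checked numerically. The following sketch uses exact integer arithmetic for the count and floating point only for the final logarithm; the function names are illustrative.

```python
import math

def num_full_rank(n, t, q):
    """Number of n x t matrices of rank t over F_q (requires t <= n)."""
    count = 1
    for i in range(t):
        count *= q ** n - q ** i
    return count

def num_rank_t(n, m, t, q):
    """Number of n x m matrices of rank t over F_q."""
    # |T_{n x t}| * |T_{t x m}| / |T_{t x t}|; a t x m full-rank matrix is the
    # transpose of an m x t full-rank matrix, so the same helper applies.
    return num_full_rank(n, t, q) * num_full_rank(m, t, q) // num_full_rank(t, t, q)

def amc_capacity(q, n, m, t):
    """Capacity of the AMC in q-ary units: nm - log_q |T_{n x m, t}|."""
    return n * m - math.log(num_rank_t(n, m, t, q)) / math.log(q)

if __name__ == "__main__":
    q, n, m, t = 256, 4, 8, 1
    print("C_AMC =", amc_capacity(q, n, m, t), "vs (m - t)(n - t) =", (m - t) * (n - t))
```

Summing `num_rank_t` over all ranks recovers $q^{nm}$, which is a convenient consistency check on the formula.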

Note that, as in Section III, the normalized capacities of the infinite-field-size AMC and the infinite-rank AMC are the same. An intuitive explanation might be the fact that, for the two channels, both the
number of bits per row and the number of bits per column tend to infinity. In contrast, the normalized
capacity is different when only one of these quantities grows while the other is fixed. This is the case of
the infinite-packet-length AMC and the infinite-batch-size AMC, which are studied in Section VI-E.

B. A Coding Scheme 

We now present an efficient coding scheme that achieves (8) and (9). The scheme is based on an "error trapping" strategy.

Let $U \in \mathbb{F}_q^{(n-v) \times (m-v)}$ be a data matrix, where $v \ge t$. A codeword $X$ is formed by adding all-zero rows and columns to $U$ so that

$$X = \begin{bmatrix} 0_{v \times v} & 0_{v \times (m-v)} \\ 0_{(n-v) \times v} & U \end{bmatrix}.$$

These all-zero rows and columns may be interpreted as the "error traps." Clearly, the rate of this scheme is $R = (n - v)(m - v)$.

Since the noise matrix $W$ has rank $t$, we can write it as

$$W = BZ = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} \begin{bmatrix} Z_1 & Z_2 \end{bmatrix}$$

where $B_1 \in \mathbb{F}_q^{v \times t}$, $B_2 \in \mathbb{F}_q^{(n-v) \times t}$, $Z_1 \in \mathbb{F}_q^{t \times v}$ and $Z_2 \in \mathbb{F}_q^{t \times (m-v)}$. The received matrix $Y$ is then given by

$$Y = X + W = \begin{bmatrix} B_1 Z_1 & B_1 Z_2 \\ B_2 Z_1 & U + B_2 Z_2 \end{bmatrix}.$$

We define an error trapping failure to be the event that $\operatorname{rank} B_1 Z_1 < t$. Intuitively, this corresponds to
the situation where either the row space or the column space of the error matrix has not been "trapped".

For now, assume that the error trapping is successful, i.e., $\operatorname{rank} B_1 = \operatorname{rank} Z_1 = t$. Consider the submatrix corresponding to the first $v$ columns of $Y$. Since $\operatorname{rank} B_1 Z_1 = t$, the rows of $B_2 Z_1$ are completely spanned by the rows of $B_1 Z_1$. Thus, there exists some matrix $\bar{T}$ such that $B_2 Z_1 = \bar{T} B_1 Z_1$. But $(B_2 - \bar{T} B_1) Z_1 = 0$ implies that $B_2 - \bar{T} B_1 = 0$, since $Z_1$ has full row rank. It follows that

$$T \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} = \begin{bmatrix} B_1 \\ 0 \end{bmatrix}, \quad \text{where } T = \begin{bmatrix} I & 0 \\ -\bar{T} & I \end{bmatrix}.$$

Note also that $TX = X$. Thus,

$$TY = TX + TW = \begin{bmatrix} B_1 Z_1 & B_1 Z_2 \\ 0 & U \end{bmatrix}$$

from which the data matrix $U$ can be readily obtained.

The complexity of the scheme is computed as follows. In order to obtain $\bar{T}$, it suffices to perform Gaussian elimination on the left $n \times v$ submatrix of $Y$, for a cost of $O(nv^2)$ operations. The data matrix can be extracted by multiplying $\bar{T}$ with the top right $v \times (m - v)$ submatrix of $Y$, which can be accomplished in $O((n - v)v(m - v))$ operations. Thus, the overall complexity of the scheme is $O(nmv)$ operations in $\mathbb{F}_q$.

Note that $B_1 Z_1$ is available at the receiver as the top-left submatrix of $Y$. Moreover, the rank of $B_1 Z_1$
is already computed during the Gaussian elimination step of the decoding. Thus, the event that the error
trapping fails can be readily detected at the receiver, which can then declare a decoding failure. It follows
that the error probability of the scheme is zero.
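
The error trapping decoder can be sketched in a few lines of Python. The sketch below makes several assumptions that are not in the text: the field is a prime field $\mathbb{F}_p$ (so that `pow(x, -1, p)` gives inverses, which requires Python 3.8 or later), the receiver knows $t$, and the noise is generated as $W = BZ$ with uniformly random factors, which has rank $t$ with high probability but is not conditioned to have rank exactly $t$. Instead of forming $\bar{T}$ explicitly, the decoder applies the equivalent row operations to $Y$ directly. All function names are illustrative.

```python
import random

def encode(U, n, m, v):
    """Embed the (n-v) x (m-v) data matrix U below and to the right of the error traps."""
    X = [[0] * m for _ in range(v)]
    X += [[0] * v + list(row) for row in U]
    return X

def random_low_rank(n, m, t, p, rng):
    """Return W = B Z over F_p with B (n x t) and Z (t x m) uniform, so rank W <= t."""
    B = [[rng.randrange(p) for _ in range(t)] for _ in range(n)]
    Z = [[rng.randrange(p) for _ in range(m)] for _ in range(t)]
    return [[sum(B[i][k] * Z[k][j] for k in range(t)) % p for j in range(m)]
            for i in range(n)]

def decode(Y, v, t, p):
    """Return U, or None if an error trapping failure is detected."""
    Y = [list(row) for row in Y]
    n = len(Y)
    rank = 0
    for col in range(v):
        # Pivots are taken only from the top v rows (the row traps).
        piv = next((r for r in range(rank, v) if Y[r][col]), None)
        if piv is None:
            continue
        Y[rank], Y[piv] = Y[piv], Y[rank]
        inv = pow(Y[rank][col], -1, p)
        Y[rank] = [(x * inv) % p for x in Y[rank]]
        for r in range(n):
            if r != rank and Y[r][col]:
                f = Y[r][col]
                Y[r] = [(a - f * b) % p for a, b in zip(Y[r], Y[rank])]
        rank += 1
    if rank < t:  # rank of the top-left v x v block is below t: trapping failed
        return None
    return [row[v:] for row in Y[v:]]

if __name__ == "__main__":
    rng = random.Random(0)
    p, n, m, t, v = 257, 6, 10, 2, 3
    U = [[rng.randrange(p) for _ in range(m - v)] for _ in range(n - v)]
    X = encode(U, n, m, v)
    W = random_low_rank(n, m, t, p, rng)
    Y = [[(x + w) % p for x, w in zip(rx, rw)] for rx, rw in zip(X, W)]
    print("decoded correctly:", decode(Y, v, t, p) == U)
```

In this sketch the decoder either returns the transmitted data matrix or reports a failure; it never returns a wrong matrix, which mirrors the zero error probability noted above.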

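Let us now compute the probability of decoding failure. Consider, for instance, $P_1 = P[\operatorname{rank} Z_1 = t]$, where $Z = \begin{bmatrix} Z_1 & Z_2 \end{bmatrix}$ is a full-rank matrix chosen uniformly at random. An equivalent way of generating $Z$ is to first generate the entries of a matrix $M \in \mathbb{F}_q^{t \times m}$ uniformly at random, and then discard $M$ if it is not full-rank. Thus, we want to compute $P_1 = P[\operatorname{rank} M_1 = t \mid \operatorname{rank} M = t]$, where $M_1$ corresponds to the first $v$ columns of $M$. This probability is

$$P_1 = \frac{P[\operatorname{rank} M_1 = t]}{P[\operatorname{rank} M = t]} = \frac{q^{tm} \prod_{i=0}^{t-1} (q^v - q^i)}{q^{tv} \prod_{i=0}^{t-1} (q^m - q^i)} > \prod_{i=0}^{t-1} (1 - q^{i-v}) \ge (1 - q^{t-1-v})^t \ge 1 - \frac{t}{q^{1+v-t}}.$$

The same analysis holds for $P_2 = P[\operatorname{rank} B_1 = t]$. By the union bound, it follows that the probability of failure satisfies

$$P_f < \frac{2t}{q^{1+v-t}}. \tag{12}$$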
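
To see how loose the bound (12) is for concrete parameters, the sketch below evaluates the failure probability exactly with rational arithmetic. It assumes, as in the analysis above, that $W = BZ$ with $B$ and $Z$ independent and uniform over full-rank matrices, so that failure occurs exactly when $B_1$ or $Z_1$ is rank-deficient. The function names are illustrative, and $v \ge t$ is required.

```python
from fractions import Fraction

def full_rank_prob(q, cols, v, t):
    """P[first v columns have rank t] for a uniform full-rank t x cols matrix."""
    prob = Fraction(1)
    for i in range(t):
        prob *= Fraction(q ** v - q ** i, q ** v) / Fraction(q ** cols - q ** i, q ** cols)
    return prob

def failure_probability(q, n, m, v, t):
    """Exact probability that rank(B1 Z1) < t."""
    p1 = full_rank_prob(q, m, v, t)  # P[rank Z1 = t]
    p2 = full_rank_prob(q, n, v, t)  # P[rank B1 = t], by transposition
    return 1 - p1 * p2

def failure_bound(q, v, t):
    """Right-hand side of (12)."""
    return Fraction(2 * t, q ** (1 + v - t))

if __name__ == "__main__":
    q, n, m, v, t = 4, 4, 6, 3, 2
    print(float(failure_probability(q, n, m, v, t)), "<", float(failure_bound(q, v, t)))
```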

Proposition 5: The coding scheme described above can achieve both capacity expressions (8) and (9).

Proof: From (12), we see that achieving either of the limiting capacities amounts to setting a suitable $v$. To achieve (8), we set $v = t$ and let $q$ grow. The resulting code will have the correct rate, namely, $R = (n - t)(m - t)$ in $q$-ary units, while the probability of failure will decrease exponentially with the field size in bits.

Alternatively, to achieve (9), we can choose some small $\epsilon > 0$ and set $v = (\tau + \epsilon)n$, where both $\tau = t/n$ and $\lambda = n/m$ are assumed fixed. By letting $m$ grow, we obtain a probability of failure that decreases exponentially with $m$. The (normalized) gap to capacity of the resulting code will be

$$\begin{aligned}
\bar{g} &= \lim_{m \to \infty} \bar{C}_{\mathrm{AMC}} - R/(nm) \\
&= (1 - \lambda\tau)(1 - \tau) - (1 - \lambda(\tau + \epsilon))(1 - (\tau + \epsilon)) \\
&= \lambda\epsilon(1 - (\tau + \epsilon)) + \epsilon(1 - \lambda\tau) \\
&< \lambda\epsilon + \epsilon = (1 + \lambda)\epsilon
\end{aligned}$$

which can be made as small as we wish. ■

V. The Additive-Multiplicative Matrix Channel

Consider a $(q, n, m, t)$ channel with $A \in \mathcal{T}_{n \times n}$, $D \in \mathcal{T}_{n \times t}$ and $Z \in \mathcal{T}_{t \times m}$ uniformly distributed and
independent from other variables. Since $A$ is invertible, we can rewrite (2) as

$$Y = AX + DZ = A(X + A^{-1}DZ). \tag{13}$$

Now, since $\mathcal{T}_{n \times n}$ acts transitively on $\mathcal{T}_{n \times t}$, the channel law (13) is equivalent to

$$Y = A(X + W) \tag{14}$$

where $A \in \mathcal{T}_{n \times n}$ and $W \in \mathcal{T}_{n \times m,t}$ are chosen uniformly at random and independently from any other
variables. We call (14) the additive-multiplicative matrix channel (AMMC).

A. Capacity 

One of the main results of this section is the following theorem, which provides an upper bound on 
the capacity of the AMMC. 

Theorem 6: For $n \le m/2$, the capacity of the AMMC is upper bounded by

$$C_{\mathrm{AMMC}} \le (m - n)(n - t) + \log_q 4(1 + n)(1 + t).$$

Proof: Let $S = X + W$. By expanding $I(X, S; Y)$, and using the fact that $X$, $S$ and $Y$ form a
Markov chain, in that order, we have

\begin{align}
I(X; Y) &= I(S; Y) - I(S; Y|X) + \underbrace{I(X; Y|S)}_{=0} \notag \\
&= I(S; Y) - I(W; Y|X) \notag \\
&= I(S; Y) - H(W|X) + H(W|X, Y) \notag \\
&= I(S; Y) - H(W) + H(W|X, Y) \tag{15} \\
&\le C_{\mathrm{MMC}} - \log_q |\mathcal{T}_{n \times m,t}| + H(W|X, Y) \tag{16}
\end{align}

where (15) follows since $X$ and $W$ are independent.

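We now compute an upper bound on $H(W|X, Y)$. Let $R = \operatorname{rank} Y$ and write $Y = G\bar{Y}$, where $G \in \mathcal{T}_{n \times R}$ and $\bar{Y} \in \mathcal{T}_{R \times m}$. Note that

$$X + W = A^{-1}Y = A^{-1}G\bar{Y} = A^*\bar{Y}$$

where $A^* = A^{-1}G$. Since $\bar{Y}$ is full-rank, it must contain an invertible $R \times R$ submatrix. By reordering columns if necessary, assume that the left $R \times R$ submatrix of $\bar{Y}$ is invertible. Write $\bar{Y} = \begin{bmatrix} \bar{Y}_1 & \bar{Y}_2 \end{bmatrix}$, $X = \begin{bmatrix} X_1 & X_2 \end{bmatrix}$ and $W = \begin{bmatrix} W_1 & W_2 \end{bmatrix}$, where $\bar{Y}_1$, $X_1$ and $W_1$ have $R$ columns, and $\bar{Y}_2$, $X_2$ and $W_2$ have $m - R$ columns. We have

$$A^* = (X_1 + W_1)\bar{Y}_1^{-1} \quad \text{and} \quad W_2 = A^*\bar{Y}_2 - X_2.$$

It follows that $W_2$ can be computed if $W_1$ is known. Thus,

\begin{align}
H(W|X, Y) = H(W_1|X, Y) &\le H(W_1|R) \le H(W_1|R = n) \notag \\
&\le \log_q \sum_{i=0}^{t} |\mathcal{T}_{n \times n,i}| \le \log_q (t + 1)|\mathcal{T}_{n \times n,t}| \tag{17}
\end{align}

where (17) follows since $W_1$ may possibly be any $n \times n$ matrix with rank $\le t$.

Applying this result in (16), and using (5) and (10), we have

\begin{align}
I(X; Y) &\le \log_q (n + 1) \begin{bmatrix} m \\ n \end{bmatrix}_q + \log_q (t + 1) \frac{|\mathcal{T}_{n \times n,t}|}{|\mathcal{T}_{n \times m,t}|} \notag \\
&\le \log_q (n + 1)(t + 1) \begin{bmatrix} m - t \\ n - t \end{bmatrix}_q \tag{18} \\
&\le (m - n)(n - t) + \log_q 4(1 + n)(1 + t) \notag
\end{align}

where (18) follows from $\begin{bmatrix} m \\ n \end{bmatrix}_q \begin{bmatrix} n \\ t \end{bmatrix}_q = \begin{bmatrix} m \\ t \end{bmatrix}_q \begin{bmatrix} m - t \\ n - t \end{bmatrix}_q$, for $t \le n \le m$. ■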

We now develop a connection with the subspace approach of [9] that will be useful to obtain a lower
bound on the capacity. From Section III, we know that, in a multiplicative matrix channel, the receiver
can only distinguish between transmitted subspaces. Thus, we can equivalently express

$$C_{\mathrm{AMMC}} = \max_{p_X} I(\mathcal{X}; \mathcal{Y})$$

where $\mathcal{X}$ and $\mathcal{Y}$ denote the row spaces of $X$ and $Y$, respectively.

Using this interpretation, we can obtain the following lower bound on capacity.

Theorem 7: Assume $n \le m$. For any $\epsilon \ge 0$, we have

$$C_{\mathrm{AMMC}} \ge (m - n)(n - t - \epsilon t) - \log_q 4 - \frac{2tnm}{q^{1+\epsilon t}}.$$

In order to prove Theorem 7, we need a few lemmas.
Lemma 8: Let $X \in \mathbb{F}_q^{n \times m}$ be a matrix of rank $k$, and let $W \in \mathbb{F}_q^{n \times m}$ be a random matrix chosen uniformly among all matrices of rank $t$. If $k + t \le \min\{n, m\}$, then

$$P[\operatorname{rank}(X + W) < k + t] < \frac{2t}{q^{\min\{n,m\}-k-t+1}}.$$

Proof: Write $X = X'X''$, where $X' \in \mathbb{F}_q^{n \times k}$ and $X'' \in \mathbb{F}_q^{k \times m}$ are full-rank matrices. We can generate $W$ as $W = W'W''$, where $W' \in \mathcal{T}_{n \times t}$ and $W'' \in \mathcal{T}_{t \times m}$ are chosen uniformly at random and independently from each other. Then we have

$$X + W = X'X'' + W'W'' = \begin{bmatrix} X' & W' \end{bmatrix} \begin{bmatrix} X'' \\ W'' \end{bmatrix}.$$

Note that $\operatorname{rank}(X + W) = k + t$ if and only if the column spaces of $X'$ and $W'$ intersect trivially and the row spaces of $X''$ and $W''$ intersect trivially. Let $P'$ and $P''$ denote the probabilities of these two events, respectively. By a simple counting argument, we have

$$P' = \frac{(q^n - q^k) \cdots (q^n - q^{k+t-1})}{(q^n - 1) \cdots (q^n - q^{t-1})} > \prod_{i=0}^{t-1} (1 - q^{k-n+i}) \ge (1 - q^{k-n+t-1})^t \ge 1 - t q^{k-n+t-1}.$$

Similarly, we have $P'' > 1 - t q^{k-m+t-1}$. Thus,

$$P[\operatorname{rank}(X + W) < k + t] < \frac{t}{q^{n-k-t+1}} + \frac{t}{q^{m-k-t+1}} \le \frac{2t}{q^{\min\{n,m\}-k-t+1}}.$$

■

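For $\dim \mathcal{X} \le n \le m$, let $\mathcal{S}_{\mathcal{X},n}$ denote the set of all $n$-dimensional subspaces of $\mathbb{F}_q^m$ that contain a subspace $\mathcal{X} \subseteq \mathbb{F}_q^m$.

Lemma 9:

$$|\mathcal{S}_{\mathcal{X},n}| = \begin{bmatrix} m - k \\ n - k \end{bmatrix}_q$$

where $k = \dim \mathcal{X}$.

Proof: By the fourth isomorphism theorem [17], there is a bijection between $\mathcal{S}_{\mathcal{X},n}$ and the set of all $(n - k)$-dimensional subspaces of the quotient space $\mathbb{F}_q^m/\mathcal{X}$. Since $\dim \mathbb{F}_q^m/\mathcal{X} = m - k$, the result follows. ■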

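We can now give a proof of Theorem 7.

Proof of Theorem 7: Assume that $X$ is selected from $\mathcal{T}_{n \times m,k}$, where $k = n - (1 + \epsilon)t$ and $\epsilon \ge 0$. Define a random variable $Q$ as

$$Q = \begin{cases} 1 & \text{if } \dim \mathcal{Y} = \operatorname{rank}(X + W) = k + t \\ 0 & \text{otherwise.} \end{cases}$$

Note that $\mathcal{X} \subseteq \mathcal{Y}$ when $Q = 1$.

By Lemma 9 and (6), we have

$$H(\mathcal{Y}|\mathcal{X}, Q = 1) \le \log_q |\mathcal{S}_{\mathcal{X},n'}| \le (m - n')t + \log_q 4$$

where $n' = k + t$. Choosing $X$ uniformly from $\mathcal{T}_{n \times m,k}$, we can also make $\mathcal{Y}$ uniform within a given dimension; in particular,

$$H(\mathcal{Y}|Q = 1) = \log_q \begin{bmatrix} m \\ n' \end{bmatrix}_q > (m - n')n'.$$

It follows that

$$\begin{aligned}
I(\mathcal{X}; \mathcal{Y}|Q = 1) &= H(\mathcal{Y}|Q = 1) - H(\mathcal{Y}|\mathcal{X}, Q = 1) \\
&> (m - n')(n' - t) - \log_q 4 \\
&\ge (m - n)(n - t - \epsilon t) - \log_q 4.
\end{aligned}$$

Now, using Lemma 8, we obtain

$$\begin{aligned}
I(\mathcal{X}; \mathcal{Y}) &= I(\mathcal{X}; \mathcal{Y}, Q) = I(\mathcal{X}; Q) + I(\mathcal{X}; \mathcal{Y}|Q) \\
&\ge I(\mathcal{X}; \mathcal{Y}|Q) \\
&\ge P[Q = 1]\, I(\mathcal{X}; \mathcal{Y}|Q = 1) \\
&\ge I(\mathcal{X}; \mathcal{Y}|Q = 1) - P[Q = 0]\, nm \\
&\ge (m - n)(n - t - \epsilon t) - \log_q 4 - \frac{2tnm}{q^{\epsilon t + 1}}.
\end{aligned}$$

■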
Note that, differently from the results of previous sections, Theorems 6 and 7 provide only upper and lower bounds on the channel capacity. Nevertheless, it is still possible to compute exact expressions for the capacity of the AMMC in certain limiting cases.

Corollary 10: For $0 < \lambda = n/m \le 1/2$ and $\tau = t/n$, we have

$$\lim_{q \to \infty} C_{\mathrm{AMMC}} = (m - n)(n - t) \tag{19}$$

$$\lim_{\substack{m \to \infty \\ n = \lambda m \\ t = \tau n}} \bar{C}_{\mathrm{AMMC}} = (1 - \lambda)(1 - \tau). \tag{20}$$

Proof: The fact that the values in (19) and (20) are upper bounds follows immediately from Theorem 6. The fact that (19) is a lower bound follows immediately from Theorem 7 by setting $\epsilon = 0$. To obtain (20) from Theorem 7, it suffices to choose $\epsilon$ such that $1/\epsilon$ grows sublinearly with $m$, e.g., $\epsilon = 1/\sqrt{m}$. ■

Once again, note that (19) agrees with (20) if we consider the normalized capacity.

Differently from the MMC and the AMC, successful decoding in the AMMC does not (necessarily) allow recovery of all sources of channel uncertainty (in this case, the matrices $A$ and $W$). In general, for every observable $(X, Y)$ pair, there are many valid $A$ and $W$ such that $Y = A(X + W)$. Such coupling between $A$ and $W$ is reflected in the extra term $H(W|X, Y)$ in (15), which provides an additional rate of roughly $(2n - t)t$ as compared to the straightforward lower bound $C_{\mathrm{AMMC}} \ge C_{\mathrm{MMC}} - \log_q |\mathcal{T}_{n \times m,t}| \approx (m - n)n - (n + m - t)t$.
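
The convergence of the bounds in Theorems 6 and 7 toward $(m - n)(n - t)$ as the field grows can be illustrated numerically. The sketch below simply evaluates the two closed-form bounds in $q$-ary units; it does not compute the capacity itself, and the function names and sample parameters are illustrative.

```python
import math

def ammc_upper_bound(q, n, m, t):
    """Upper bound of Theorem 6 (stated for n <= m/2)."""
    return (m - n) * (n - t) + math.log(4 * (1 + n) * (1 + t), q)

def ammc_lower_bound(q, n, m, t, eps=0.0):
    """Lower bound of Theorem 7 (stated for n <= m and eps >= 0)."""
    return ((m - n) * (n - t - eps * t)
            - math.log(4, q)
            - 2 * t * n * m / q ** (1 + eps * t))

if __name__ == "__main__":
    n, m, t = 4, 16, 1
    for bits in (8, 16, 64):
        q = 2 ** bits
        lo, hi = ammc_lower_bound(q, n, m, t), ammc_upper_bound(q, n, m, t)
        print(f"q = 2^{bits}: {lo:.3f} <= C_AMMC <= {hi:.3f}")
```

With $\epsilon = 0$ both bounds bracket $(m - n)(n - t)$, and the gap between them shrinks as $q$ increases.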

Remark: In [11], the problem of communicating over the AMMC was addressed assuming a specific
form of transmission matrices that contained an $n \times n$ identity header. Note that, if we consider such
a header as being part of the channel (i.e., beyond the control of the code designer) then, with high
probability, the resulting channel becomes equivalent to the AMC (see [11] for details). However, as
a coding strategy for the AMMC, using such an $n \times n$ identity header results in a suboptimal input
distribution, as the mutual information achieved is strictly smaller than the capacity. Indeed, the capacity-achieving distribution used in Theorem 7 and Corollary 10 corresponds to transmission matrices of rank
$n - (1 + \epsilon)t$. This result shows that, for the AMMC, using headers is neither fundamental nor asymptotically
optimal.



B. A Coding Scheme 

We now propose an efficient coding scheme that can asymptotically achieve ( [T9l ) and (l20l ). The scheme 
is based on a combination of channel sounding and error trapping strategies. 



For a data matrix [/ G F, 



X 



(n—v) X {m—n) 


X 



, where v >t, let the corresponding codeword be 



0^> 



0, 



vx{n—v) 







'^(n—v)xv -'(n— D)x(n— tj) 



11 x(m—n) 

u 



Note that the all-zero matrices provide the error traps, while the identity matrix corresponds to the pilot 
symbols. Clearly, the rate of this scheme is R = {n — v){m — n). 
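Write the noise matrix $W$ as

$$W = BZ = \begin{bmatrix} B_1 \\ B_2 \end{bmatrix} \begin{bmatrix} Z_1 & Z_2 & Z_3 \end{bmatrix},$$

where $B_1 \in \mathbb{F}_q^{v \times t}$, $B_2 \in \mathbb{F}_q^{(n-v) \times t}$, $Z_1 \in \mathbb{F}_q^{t \times v}$, $Z_2 \in \mathbb{F}_q^{t \times (n-v)}$ and
$Z_3 \in \mathbb{F}_q^{t \times (m-n)}$. The auxiliary matrix $S$ is then given by

$$S = X + W = \begin{bmatrix} B_1 Z_1 & B_1 Z_2 & B_1 Z_3 \\ B_2 Z_1 & I + B_2 Z_2 & U + B_2 Z_3 \end{bmatrix}.$$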
As in Section IV, we say that the error trapping is successful if $\operatorname{rank} B_1 Z_1 = t$. Assume
that this is the case. From Section IV, there exists some matrix $T \in \mathcal{T}_{n \times n}$ such that

$$TS = \begin{bmatrix} B_1 Z_1 & B_1 Z_2 & B_1 Z_3 \\ 0 & I & U \end{bmatrix} = \begin{bmatrix} B_1 & 0 \\ 0 & I \end{bmatrix} \begin{bmatrix} Z_1 & Z_2 & Z_3 \\ 0 & I & U \end{bmatrix}.$$



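Note further that

$$\operatorname{RRE}\left( \begin{bmatrix} Z_1 & Z_2 & Z_3 \\ 0 & I & U \end{bmatrix} \right) = \begin{bmatrix} \tilde{Z}_1 & 0 & \tilde{Z}_3 \\ 0 & I & U \end{bmatrix}$$

for some $\tilde{Z}_1 \in \mathbb{F}_q^{t \times v}$ in RRE form and some $\tilde{Z}_3 \in \mathbb{F}_q^{t \times (m-n)}$. It follows that

$$\operatorname{RRE}(S) = \operatorname{RRE}\left( \begin{bmatrix} Z_1 & Z_2 & Z_3 \\ 0 & I & U \\ 0 & 0 & 0 \end{bmatrix} \right) = \begin{bmatrix} \tilde{Z}_1 & 0 & \tilde{Z}_3 \\ 0 & I & U \\ 0 & 0 & 0 \end{bmatrix},$$

where the bottom $v - t$ rows are all-zeros.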

Since $A$ is invertible, we have $\operatorname{RRE}(Y) = \operatorname{RRE}(S)$, from which $U$ can be readily obtained. Thus,
decoding amounts to performing Gauss-Jordan elimination on $Y$. It follows that the complexity of the
scheme is $O(n^2 m)$ operations in $\mathbb{F}_q$.

The probability that the error trapping is not successful, i.e., $\operatorname{rank} B_1 Z_1 < t$, was computed in
Section IV. Let $\hat{A}$ correspond to the first $n$ columns of $Y$. Note that $\operatorname{rank} B_1 Z_1 = t$ if and only if
$\operatorname{rank} \hat{A} = n - v + t$. Thus, when the error trapping is not successful, the receiver can easily detect this
event by looking at $\operatorname{RRE}(Y)$ and then declare a decoding failure. It follows that the scheme has zero
error probability and probability of failure given by (12).
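
The encoding and decoding steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the field is taken to be a prime field GF(p) so that integer arithmetic modulo p suffices (the scheme itself applies to any $\mathbb{F}_q$), the decoder is assumed to know n, m, v and t, and all function names (`rre`, `encode`, `channel`, `decode`, `random_full_rank`) are hypothetical.

```python
import random

def rre(M, p):
    """Gauss-Jordan elimination over GF(p), p prime: returns (RRE, pivot columns)."""
    R = [[x % p for x in row] for row in M]
    pivots, r = [], 0
    for c in range(len(R[0])):
        if r == len(R):
            break
        piv = next((i for i in range(r, len(R)) if R[i][c]), None)
        if piv is None:
            continue
        R[r], R[piv] = R[piv], R[r]
        inv = pow(R[r][c], -1, p)  # modular inverse (Python 3.8+)
        R[r] = [x * inv % p for x in R[r]]
        for i in range(len(R)):
            if i != r and R[i][c]:
                f = R[i][c]
                R[i] = [(a - f * b) % p for a, b in zip(R[i], R[r])]
        pivots.append(c)
        r += 1
    return R, pivots

def matmul(A, B, p):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def random_full_rank(rows, cols, p, rng):
    """Rejection-sample a uniformly random full-rank rows x cols matrix."""
    while True:
        M = [[rng.randrange(p) for _ in range(cols)] for _ in range(rows)]
        if len(rre(M, p)[1]) == min(rows, cols):
            return M

def encode(U, n, m, v):
    """X = [[0, 0, 0], [0, I, U]]: v all-zero rows and v all-zero columns act as traps."""
    k = n - v
    top = [[0] * m for _ in range(v)]
    bottom = [[0] * v + [int(i == j) for j in range(k)] + list(U[i]) for i in range(k)]
    return top + bottom

def channel(X, n, m, t, p, rng):
    """Y = A(X + W) with A uniform invertible and W = BZ uniform of rank t."""
    A = random_full_rank(n, n, p, rng)
    W = matmul(random_full_rank(n, t, p, rng), random_full_rank(t, m, p, rng), p)
    S = [[(x + w) % p for x, w in zip(rx, rw)] for rx, rw in zip(X, W)]
    return matmul(A, S, p)

def decode(Y, n, m, v, t, p):
    """Return U, or None (decoding failure) when error trapping is detected to fail."""
    R, pivots = rre(Y, p)
    if sum(1 for c in pivots if c < n) != n - v + t:  # rank of the first n columns
        return None
    return [R[i][n:] for i in range(t, t + n - v)]    # rows of the block [0 I U]

rng = random.Random(0)
p, n, m, v, t = 251, 6, 10, 3, 2
U = [[rng.randrange(p) for _ in range(m - n)] for _ in range(n - v)]
Y = channel(encode(U, n, m, v), n, m, t, p, rng)
U_hat = decode(Y, n, m, v, t, p)  # either U or None, never a wrong matrix
```

The decoder never inspects $A$ or $W$ directly; it only checks the rank of the first $n$ columns of $Y$ and reads $U$ off the reduced form, which is why a trapping failure shows up as a declared failure rather than as an undetected error.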

Theorem 11: The proposed coding scheme can asymptotically achieve (19) and (20).

Proof: Using (12) and the same argument as in the proof of Proposition 5, we can set a suitable
$v$ in order to achieve an arbitrarily low gap to capacity while maintaining an arbitrarily low probability of
failure, for both cases where $q \to \infty$ or $m \to \infty$. ■



VI. Extensions 

In this section, we discuss possible extensions of the results and models presented in the previous 
sections. 
A. Dependent Transfer Matrices

As discussed in Section V, the AMMC is equivalent to a channel of the form (2), $Y = AX + DZ$, where $A \in \mathcal{T}_{n \times n}$
and $D \in \mathcal{T}_{n \times t}$ are chosen uniformly at random and independently from each other. Suppose now that
the channel is the same, except for the fact that $A$ and $D$ are not independent. It should be clear that
the capacity of the channel cannot be smaller than that of the AMMC. For instance, one can always
convert this channel into an AMMC by employing randomization at the source. (This is, in fact, a
natural procedure in any random network coding system.) Let $X = TX'$, where $T \in \mathcal{T}_{n \times n}$ is chosen
uniformly at random and independent from any other variables. Then $A' = AT$ is uniform on $\mathcal{T}_{n \times n}$ and
independent from $D$. Thus, the channel given by $Y = A'X' + DZ$ is an AMMC.
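
A short justification of the uniformity claim, sketched here using only the fact that $\mathcal{T}_{n \times n}$ is a group under multiplication: for any fixed $a, a' \in \mathcal{T}_{n \times n}$ and any $d$,

$$P[A' = a' \mid A = a, D = d] = P[T = a^{-1} a'] = \frac{1}{|\mathcal{T}_{n \times n}|},$$

since $T$ is uniform and independent of $(A, D)$. The right-hand side depends on neither $a$ nor $d$, so $A'$ is uniform on $\mathcal{T}_{n \times n}$ and independent of $D$, whatever the joint distribution of $A$ and $D$.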

Note that our coding scheme does not rely on any particular statistics of A given X and W (except 
the assumption that A is invertible) and therefore works unchanged in this case. 

B. Transfer Matrix Invertible but Nonuniform 

The model for the AMMC assumes that the transfer matrix $A \in \mathcal{T}_{n \times n}$ is chosen uniformly at random.
In a realistic network coding system, the transfer matrix may be a function of both the network code and
the network topology, and therefore may not have a uniform distribution. Consider the case where $A$ is
chosen according to an arbitrary probability distribution on $\mathcal{T}_{n \times n}$. It should be clear that the capacity can
only increase as compared with the AMMC, since less "randomness" is introduced in the channel. The
best possible situation is to have a constant $A$, in which case the channel becomes exactly an AMC.

Again, note that our coding scheme for the AMMC is still applicable in this case. 

C. Nonuniform Packet Errors 

When expressed in the form (2), the models for both the AMC and the AMMC assume that the matrix
$Z$ is uniformly distributed on $\mathcal{T}_{t \times m}$. In particular, each error packet is uniformly distributed on $\mathbb{F}_q^{1 \times m} \setminus \{0\}$.
In a realistic situation, however, it may be the case that error packets of low weight are more likely to
occur. Consider a model identical to the AMC or the AMMC except for the fact that the matrix $Z$ is
chosen according to an arbitrary probability distribution on $\mathcal{T}_{t \times m}$. Once again, it should be clear that
the capacity can only increase. Note that the exact capacity in Proposition 4 and the upper bound of
Theorem 6 can be easily modified to account for this case (by replacing $\log_q |\mathcal{T}_{n \times m,t}|$ with the entropy
of $W$).

Although our coding scheme in principle does not apply in this more general case, we can easily convert
the channel into an AMC or AMMC by applying a random transformation at the source (and its inverse
at the destination). Let $X = X'T$, where $T \in \mathcal{T}_{m \times m}$ is chosen uniformly at random and independent
from any other variables. Then

$$Y' = YT^{-1} = (AX + DZ)T^{-1} = AX' + DZ',$$

where $Z' = ZT^{-1}$. Since $\mathcal{T}_{m \times m}$ acts (by right multiplication) transitively on $\mathcal{T}_{t \times m}$, we have that $Z'$ is
uniform on $\mathcal{T}_{t \times m}$. Thus, we obtain precisely an AMMC (or AMC) and the assumptions of our coding
scheme hold.
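
The identity $YT^{-1} = AX' + DZ'$ can be checked numerically with a short Python sketch. It assumes a prime field GF(p) for simplicity; the helper names (`matmul`, `inverse`, `rand`) and the toy parameters are hypothetical, and the single low-weight error packet `Z` merely stands in for a nonuniform error model.

```python
import random

def matmul(A, B, p):
    return [[sum(a * b for a, b in zip(row, col)) % p for col in zip(*B)] for row in A]

def inverse(T, p):
    """Inverse of a square matrix over GF(p) via Gauss-Jordan on [T | I]; None if singular."""
    k = len(T)
    M = [[x % p for x in row] + [int(i == j) for j in range(k)] for i, row in enumerate(T)]
    for c in range(k):
        piv = next((i for i in range(c, k) if M[i][c]), None)
        if piv is None:
            return None
        M[c], M[piv] = M[piv], M[c]
        inv = pow(M[c][c], -1, p)  # modular inverse (Python 3.8+)
        M[c] = [x * inv % p for x in M[c]]
        for i in range(k):
            if i != c and M[i][c]:
                f = M[i][c]
                M[i] = [(a - f * b) % p for a, b in zip(M[i], M[c])]
    return [row[k:] for row in M]

def rand(rows, cols, p, rng):
    return [[rng.randrange(p) for _ in range(cols)] for _ in range(rows)]

def add(A, B, p):
    return [[(a + b) % p for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

rng = random.Random(1)
p, n, m, t = 7, 3, 5, 1
T = None
while T is None or inverse(T, p) is None:  # rejection-sample an invertible T
    T = rand(m, m, p, rng)
T_inv = inverse(T, p)

A, D = rand(n, n, p, rng), rand(n, t, p, rng)
X_prime = rand(n, m, p, rng)               # matrix chosen by the encoder
Z = [[1] + [0] * (m - 1)]                  # a weight-one (nonuniform) error packet

X = matmul(X_prime, T, p)                  # transmitted matrix X = X'T
Y = add(matmul(A, X, p), matmul(D, Z, p), p)
Y_prime = matmul(Y, T_inv, p)              # receiver undoes T
Z_prime = matmul(Z, T_inv, p)              # effective error packet seen by the code
expected = add(matmul(A, X_prime, p), matmul(D, Z_prime, p), p)
```

Here `Z_prime` is simply the first row of $T^{-1}$, so the structured error packet has been replaced by a randomized one, which is the effect the transformation is meant to achieve.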

Note, however, that, depending on the error model, the capacity may be much larger than what can be
achieved by the scheme described above. For instance, if the rows of $Z$ are constrained to have weight at
most $s$ (otherwise chosen, say, uniformly at random), then the capacity would increase by approximately
$\left(m - s - \log_q \binom{m}{s}\right) t$, which might be a substantial amount if $s$ is small.

D. Error Matrix with Variable Rank ($\leq t$)

The model we considered for the AMC and the AMMC assumes an error matrix $W$ whose rank is
known and equal to $t$. It is useful to consider the case where $\operatorname{rank} W$ is allowed to vary, while still
bounded by $t$. More precisely, we assume that $W$ is chosen uniformly at random from $\mathcal{T}_{n \times m,R}$, where
$R \in \{0, \ldots, t\}$ is a random variable with probability distribution $P[R = r] = p_r$.

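Since

$$\begin{aligned} H(W) = H(W, R) &= H(R) + H(W|R) \\ &= H(R) + \sum_r p_r H(W|R = r) \\ &= H(R) + \sum_r p_r \log_q |\mathcal{T}_{n \times m,r}| \\ &\leq H(R) + \log_q |\mathcal{T}_{n \times m,t}|, \end{aligned}$$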

we conclude that the capacities of the AMC and the AMMC may be reduced by at most $H(R) \leq
\log_q(t + 1)$. This loss is asymptotically negligible for large $q$ and/or large $m$, so the expressions (8), (9),
(19) and (20) remain unchanged.
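
To get a feel for the size of this loss, the entropy chain can be evaluated numerically. The Python sketch below is only an illustration: the rank distribution `probs` and the parameters are hypothetical, and `num_rank_r` uses the standard counting formula for the number of $n \times m$ matrices of rank $r$ over $\mathbb{F}_q$, which is not restated in this section. All entropies are in $q$-ary units.

```python
from math import log

def num_rank_r(n, m, r, q):
    """Number of n x m matrices of rank r over F_q (standard counting formula)."""
    num = den = 1
    for i in range(r):
        num *= (q**n - q**i) * (q**m - q**i)
        den *= q**r - q**i
    return num // den

q, n, m, t = 4, 4, 8, 2
probs = [0.5, 0.3, 0.2]  # hypothetical p_r = P[R = r] for r = 0, ..., t

H_R = -sum(pr * log(pr, q) for pr in probs if pr > 0)
H_W = H_R + sum(pr * log(num_rank_r(n, m, r, q), q) for r, pr in enumerate(probs))
bound = H_R + log(num_rank_r(n, m, t, q), q)

print(f"H(R) = {H_R:.3f}, log_q(t+1) = {log(t + 1, q):.3f}")
print(f"H(W) = {H_W:.3f} <= {bound:.3f}")
```

For these parameters the number of rank-$r$ matrices grows with $r$, which is what the final inequality in the chain relies on.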

The steps for decoding and computing the probability of error trapping failure also remain the same,
provided we replace $t$ by $R$. The only difference is that now decoding errors may occur. More precisely,
suppose that $\operatorname{rank} B_1 Z_1 = t' \leq t$. A necessary condition for success is that $\operatorname{rank} B_1 Z = \operatorname{rank} B Z_1 = t'$.
If this condition is not satisfied, then a decoding failure is declared. However, if the condition is true,
then the decoder cannot determine whether $t' = R \leq t$ (an error trapping success) or $t' < R \leq t$ (an
error trapping failure), and must proceed assuming the former case. If the latter case turns out to be
true, we would have an undetected error. Thus, for this model, the expression (12) gives a bound on the
probability that decoding is not successful, i.e., that either an error or a failure occurs.

E. Infinite-Packet-Length Channel and Infinite-Batch-Size Channel 

We now extend our results to the infinite-packet-length AMC and AMMC and the infinite-batch-size
AMC. (Note that, as pointed out in Section III, there is little justification to consider an infinite-batch-size
AMMC.) From the proof of Proposition 4 and the proof of Corollary 10, it is straightforward to see that



$$\lim_{m \to \infty} \bar{C}_{\text{AMMC}} = \lim_{m \to \infty} \bar{C}_{\text{AMC}} = (n - t)/n$$

$$\lim_{n \to \infty} \bar{C}_{\text{AMC}} = (m - t)/m.$$

It is not straightforward, however, to obtain capacity-achieving schemes for these channels. The schemes
described in Sections IV and V for the infinite-rank AMC and AMMC, respectively, use an error trap
whose size (in terms of columns and rows) grows proportionally with $m$ (or $n$). While this is necessary
for achieving vanishingly small error probability, it also implies that these schemes are not suitable for
the infinite-packet-length channel (where $m \to \infty$ but not $n$) or the infinite-batch-size channel (where
$n \to \infty$ but not $m$).

In these situations, the proposed schemes can be adapted by replacing the data matrix and part of
the error trap with a maximum-rank-distance (MRD) code [18]. Consider first an infinite-packet-length
AMC. Let the transmitted matrix be given by

$$X = \begin{bmatrix} 0_{n \times v} & x \end{bmatrix} \tag{21}$$

where $x \in \mathbb{F}_q^{n \times (m-v)}$ is a codeword of a matrix code $\mathcal{C}$. If (column) error trapping is successful then,
under the terminology of [10], the decoding problem for $\mathcal{C}$ amounts to the correction of $t$ erasures. It
is known that, for $m - v \geq n$, an MRD code $\mathcal{C} \subseteq \mathbb{F}_q^{n \times (m-v)}$ with rate $(n - t)/n$ can correct exactly $t$
erasures (with zero probability of error) [10]. Thus, decoding fails if and only if column trapping fails.
Similarly, for an infinite-batch-size AMC, let the transmitted matrix be given by

$$X = \begin{bmatrix} 0_{v \times m} \\ x \end{bmatrix}$$

where $x \in \mathbb{F}_q^{(n-v) \times m}$ is a codeword of a matrix code $\mathcal{C}$. If (row) error trapping is successful then, under
the terminology of [10], the decoding problem for $\mathcal{C}$ amounts to the correction of $t$ deviations. It is known
that, for $n - v \geq m$, an MRD code $\mathcal{C} \subseteq \mathbb{F}_q^{(n-v) \times m}$ with rate $(m - t)/m$ can correct exactly $t$ deviations
(with zero probability of error) [10]. Thus, decoding fails if and only if row trapping fails.

Finally, for the infinite-packet-length AMMC, it is sufficient to prepend to (21) an identity matrix, i.e.,

$$X = \begin{bmatrix} I_{n \times n} & 0_{n \times v} & x \end{bmatrix}.$$

The same reasoning as for the infinite-packet-length AMC applies here, and the decoder in [10] is also
applicable in this case.

For more details on the decoding of an MRD code combined with an error trap, we refer the reader
to [19]. The decoding complexity is in $O(tn^2 m)$ or $O(tm^2 n)$ (whichever is smaller) [10].

In all cases, the schemes have probability of error upper bounded by $t/q^{1+v-t}$ and therefore are
capacity-achieving.
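
To make the capacity-achieving claim concrete for the infinite-packet-length AMC, here is a brief worked computation, given as a sketch that assumes the trap width $v$ is chosen to grow with $m$ slowly enough that $v/m \to 0$. An MRD code $\mathcal{C} \subseteq \mathbb{F}_q^{n \times (m-v)}$ of rate $(n - t)/n$ carries $\log_q |\mathcal{C}| = (n - t)(m - v)$ symbols, so the normalized rate of the scheme (21) is

$$\frac{(n - t)(m - v)}{nm} = \frac{n - t}{n} \left( 1 - \frac{v}{m} \right) \longrightarrow \frac{n - t}{n} \quad \text{as } m \to \infty,$$

which matches the limiting normalized capacity stated above. The infinite-batch-size case is analogous, with the roles of $n$ and $m$ exchanged.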

VII. Conclusions 

We have considered the problem of reliable communication over certain additive matrix channels 
inspired by network coding. These channels provide a reasonable model for both coherent and random 
network coding systems subject to random packet errors. In particular, for an additive-multiplicative 
matrix channel, we have obtained upper and lower bounds on capacity for any channel parameters and 
asymptotic capacity expressions in the limit of large field size and/or large matrix size; roughly speaking, 
we need to use t redundant packets in order to be able to correct up to t injected error packets. We have 
also presented a simple coding scheme that achieves capacity in these limiting cases while requiring
low decoding complexity; in fact, decoding amounts simply to performing Gauss-Jordan
elimination, which is already the standard decoding procedure for random network coding. Compared 
to previous work on correction of adversarial errors (where approximately 2t redundant packets are 
required), the results of this paper show an improvement of t redundant packets that can be used to 
transport data, if errors occur according to a probabilistic model. 

Several questions remain open and may serve as an interesting avenue for future research: 
• Our results for the AMMC assume that the transfer matrix A is always nonsingular. It may be useful 
to consider a model where rank A is a random variable. Note that, in this case, one cannot expect 
to achieve reliable (and efficient) communication with a one-shot code, as the channel realization
would be unknown at the transmitter. Thus, in order to achieve capacity under such a model (even 
with arbitrarily large q or m), it is strictly necessary to consider multi-shot codes. 
• As pointed out in Section VI-C, our proposed coding scheme may not be even close to optimal
when packet errors occur according to a nonuniform probability model. Especially in the case of 
low-weight errors, it is an important question how to approach capacity with a low-complexity 
coding scheme. It might also be interesting to know whether one-shot codes are still useful in this 
case. 

• Another important assumption of this paper is the bounded number of t < n packet errors. What
if t is unbounded (although with a low number of errors being more likely than a high number)? 
While the capacity of such a channel may not be too hard to approximate (given the results of this 
paper), finding a low-complexity coding scheme seems a very challenging problem. 